AITopics | kl loss

Collaborating Authors

kl loss

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

87ee1bbac4635e7c948f3eea83c1f262-Paper-Conference.pdf

Neural Information Processing SystemsFeb-16-2026, 10:16:35 GMT

artificial intelligence, machine learning, robustness, (18 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

9df81829c4ebc9c427b9afe0438dce5a-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-9-2026, 13:54:47 GMT

began, latent code, modification, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.30)

Add feedback

The power of absolute discounting: all-dimensional distribution estimation

Moein Falahatgar, Mesrob I. Ohannessian, Alon Orlitsky, Venkatadheeraj Pichapati

Neural Information Processing SystemsNov-21-2025, 06:52:09 GMT

Neural Information Processing Systems http://nips.cc/

absolute discounting, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan > Wayne County > Detroit (0.04)
North America > United States > Maryland (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Middle East > Iraq > Baghdad Governorate > Baghdad (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Decoupled Kullback-Leibler Divergence Loss

Neural Information Processing SystemsOct-10-2025, 08:39:03 GMT

Firstly, we address the limitation of KL/DKL in scenarios like knowledge distillation by breaking its asymmetric optimization property.

knowledge distillation, optimization, robustness, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Rethinking KL Regularization in RLHF: From Value Estimation to Gradient Optimization

Liu, Kezhao, Liu, Jason Klein, Chen, Mingtao, Liu, Yiming

arXiv.org Artificial IntelligenceOct-7-2025

Reinforcement Learning from Human Feedback (RLHF) leverages a Kullback-Leibler (KL) divergence loss to stabilize training and prevent overfitting. However, in methods such as GRPO, its implementation may be guided by principles from numerical value estimation-a practice that overlooks the term's functional role as an optimization loss. To analyze this issue, we establish a unified framework that connects two seemingly distinct implementation styles: using the mathematical term $k_n$ as a detached coefficient for the policy's score function ('$k_n$ in reward') or as a direct loss function through which gradients are propagated ('$k_n$ as loss'). We show that the latter can always be analyzed via an equivalent gradient coefficient in the former, unifying the two perspectives. Through this framework, we prove that the conventional '$k_1$ in reward' (like in PPO) is the principled loss for Reverse KL (RKL) regularization. We further establish a key finding: under on-policy conditions, the '$k_2$ as loss' formulation is, in fact, gradient-equivalent to '$k_1$ in reward'. This equivalence, first proven in our work, identifies both as the theoretically sound implementations of the RKL objective. In contrast, we show that the recently adopted '$k_3$ as loss' (like in GRPO) is merely a first-order, biased approximation of the principled loss. Furthermore, we argue that common off-policy implementations of '$k_n$ as loss' methods are biased due to neglected importance sampling, and we propose a principled correction. Our findings provide a comprehensive, gradient-based rationale for choosing and correctly implementing KL regularization, paving the way for more robust and effective RLHF systems.

artificial intelligence, coefficient, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2510.01555

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

We thank the reviewers for their insights. We summarize the overall response as positive. We address the concerns in the order received. There might be a misunderstanding here. The sole concern of R1 is that our model is an "extension of the idea in " However, the provided references [1, 2] do not show that BEGAN could do such a thing.

author response, deep automodulator, latent code, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.30)

Add feedback

Generalized Kullback-Leibler Divergence Loss

Cui, Jiequan, Zhu, Beier, Xu, Qingshan, Tian, Zhuotao, Qi, Xiaojuan, Yu, Bei, Zhang, Hanwang, Hong, Richang

arXiv.org Artificial IntelligenceMar-11-2025

In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss that consists of (1) a weighted Mean Square Error (wMSE) loss and (2) a Cross-Entropy loss incorporating soft labels. Thanks to the decoupled structure of DKL loss, we have identified two areas for improvement. Firstly, we address the limitation of KL loss in scenarios like knowledge distillation by breaking its asymmetric optimization property along with a smoother weight function. This modification effectively alleviates convergence challenges in optimization, particularly for classes with high predicted scores in soft labels. Secondly, we introduce class-wise global information into KL/DKL to reduce bias arising from individual samples. With these two enhancements, we derive the Generalized Kullback-Leibler (GKL) Divergence loss and evaluate its effectiveness by conducting experiments on CIFAR-10/100, ImageNet, and vision-language datasets, focusing on adversarial training, and knowledge distillation tasks. Specifically, we achieve new state-of-the-art adversarial robustness on the public leaderboard -- RobustBench and competitive knowledge distillation performance across CIFAR/ImageNet models and CLIP models, demonstrating the substantial practical merits. Our code is available at https://github.com/jiequancui/DKL.

distillation, knowledge distillation, robustness, (17 more...)

arXiv.org Artificial Intelligence

2503.08038

Country: